This data set contains 4,898 white wines with 11 variables quantifying the chemical properties of each wine.
At least 3 wine experts rated the quality of each wine on a scale from 0 (very bad) to 10 (excellent).
We will explore which chemical properties influence the quality of white wines, and also the relationships among the chemical properties themselves.
## [1] FALSE
## X fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 1 7.0 0.27 0.36 20.7 0.045
## 2 2 6.3 0.30 0.34 1.6 0.049
## 3 3 8.1 0.28 0.40 6.9 0.050
## 4 4 7.2 0.23 0.32 8.5 0.058
## 5 5 7.2 0.23 0.32 8.5 0.058
## 6 6 8.1 0.28 0.40 6.9 0.050
## free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol
## 1 45 170 1.0010 3.00 0.45 8.8
## 2 14 132 0.9940 3.30 0.49 9.5
## 3 30 97 0.9951 3.26 0.44 10.1
## 4 47 186 0.9956 3.19 0.40 9.9
## 5 47 186 0.9956 3.19 0.40 9.9
## 6 30 97 0.9951 3.26 0.44 10.1
## quality
## 1 6
## 2 6
## 3 6
## 4 6
## 5 6
## 6 6
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1 Min. : 3.800 Min. :0.0800 Min. :0.0000
## 1st Qu.:1225 1st Qu.: 6.300 1st Qu.:0.2100 1st Qu.:0.2700
## Median :2450 Median : 6.800 Median :0.2600 Median :0.3200
## Mean :2450 Mean : 6.855 Mean :0.2782 Mean :0.3342
## 3rd Qu.:3674 3rd Qu.: 7.300 3rd Qu.:0.3200 3rd Qu.:0.3900
## Max. :4898 Max. :14.200 Max. :1.1000 Max. :1.6600
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.600 Min. :0.00900 Min. : 2.00
## 1st Qu.: 1.700 1st Qu.:0.03600 1st Qu.: 23.00
## Median : 5.200 Median :0.04300 Median : 34.00
## Mean : 6.391 Mean :0.04577 Mean : 35.31
## 3rd Qu.: 9.900 3rd Qu.:0.05000 3rd Qu.: 46.00
## Max. :65.800 Max. :0.34600 Max. :289.00
## total.sulfur.dioxide density pH sulphates
## Min. : 9.0 Min. :0.9871 Min. :2.720 Min. :0.2200
## 1st Qu.:108.0 1st Qu.:0.9917 1st Qu.:3.090 1st Qu.:0.4100
## Median :134.0 Median :0.9937 Median :3.180 Median :0.4700
## Mean :138.4 Mean :0.9940 Mean :3.188 Mean :0.4898
## 3rd Qu.:167.0 3rd Qu.:0.9961 3rd Qu.:3.280 3rd Qu.:0.5500
## Max. :440.0 Max. :1.0390 Max. :3.820 Max. :1.0800
## alcohol quality
## Min. : 8.00 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.40 Median :6.000
## Mean :10.51 Mean :5.878
## 3rd Qu.:11.40 3rd Qu.:6.000
## Max. :14.20 Max. :9.000
## 'data.frame': 4898 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
Our dataset consists of 11 input variables and 1 output variable (plus the row index X, which brings the data frame to 13 columns), with 4,898 observations.
See the distribution of the output variable (quality).
Most white wines in this dataset have a quality score of 5, 6, or 7.
The number of samples at each quality score is calculated below.
## Group.1 number_of_sample
## 1 3 20
## 2 4 163
## 3 5 1457
## 4 6 2198
## 5 7 880
## 6 8 175
## 7 9 5
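The counts above can be turned into shares to quantify how concentrated the ratings are. The following is a minimal sketch in base R that re-enters the counts from the table by hand; the names `quality_counts`, `quality_share` and `middle_share` are illustrative and do not come from the original analysis.

```r
# Counts per quality score, copied from the table above.
quality_counts <- c("3" = 20, "4" = 163, "5" = 1457, "6" = 2198,
                    "7" = 880, "8" = 175, "9" = 5)

# Share of all samples at each quality score.
quality_share <- quality_counts / sum(quality_counts)

# Combined share of the three most common scores (5, 6 and 7).
middle_share <- sum(quality_share[c("5", "6", "7")])

print(round(quality_share, 4))
print(round(middle_share, 3))
```

With these counts, scores 5 to 7 together account for roughly 93% of the 4,898 samples.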
See the distribution of an input variable (fixed acidity).
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 6.300 6.800 6.855 7.300 14.200
The median of fixed acidity is 6.8 and the mean is 6.855. The distribution is bell shaped, so fixed acidity appears to be approximately normally distributed.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0800 0.2100 0.2600 0.2782 0.3200 1.1000
The median of volatile.acidity is 0.260 and the mean is 0.278. This distribution is slightly right skewed but still bell shaped.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2700 0.3200 0.3342 0.3900 1.6600
This distribution is clearly bell shaped, but there is a small difference between the median (0.320) and the mean (0.334). Presumably this is because of an unusual peak at 0.5 g/dm^3, which pulls the mean toward a higher value.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.700 5.200 6.391 9.900 65.800
Most wines have residual sugar between 0 and 20 g/dm^3, so the graph below zooms in on that part of the x scale.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.700 5.200 6.391 9.900 65.800
This distribution is fairly right skewed: the median is 5.200 and the mean is 6.391. Many white wines fall between 1 and 2 g/dm^3 of residual sugar.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600
The median is 0.0430 and the mean is 0.0458. Even though there are some outliers between 0.1 and 0.35, this distribution is fairly bell shaped.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 23.00 34.00 35.31 46.00 289.00
The median is 34.00 and the mean is 35.31. Even though there are some outliers between 100 and 300, this distribution is fairly bell shaped.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 108.0 134.0 138.4 167.0 440.0
The median is 134.0 and the mean is 138.4. Even though there are some outliers between 300 and 450, this distribution is fairly bell shaped.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9917 0.9937 0.9940 0.9961 1.0390
The median and the mean are almost the same, 0.9937 and 0.9940 respectively. A small number of outliers are observed in density, but the distribution is fairly bell shaped.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.090 3.180 3.188 3.280 3.820
The median and the mean are almost the same, 3.180 and 3.188 respectively, and the distribution is fairly bell shaped and approximately normal.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2200 0.4100 0.4700 0.4898 0.5500 1.0800
The median is 0.470 and the mean is 0.490. This distribution is fairly bell shaped but slightly right skewed.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.50 10.40 10.51 11.40 14.20
There is a peak between 9.25% and 9.5% alcohol. The distribution is slightly right skewed, and the data are spread broadly.
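The skewness judgments above were made by eye from histograms and from the gap between mean and median. As a rough cross-check, the quartile (Bowley) skewness can be computed from the `summary()` values alone. This is a sketch, not the method used in this report, and `bowley_skew` and `quartiles` are illustrative names.

```r
# Quartile (Bowley) skewness: (Q3 + Q1 - 2 * median) / (Q3 - Q1).
# 0 means the quartiles are symmetric around the median; positive values
# mean the upper quartile lies farther from the median than the lower one.
bowley_skew <- function(q1, med, q3) {
  (q3 + q1 - 2 * med) / (q3 - q1)
}

# Quartiles copied from the summary() output printed above.
quartiles <- data.frame(
  variable = c("fixed.acidity", "residual.sugar", "alcohol"),
  q1  = c(6.3, 1.7, 9.5),
  med = c(6.8, 5.2, 10.4),
  q3  = c(7.3, 9.9, 11.4)
)

quartiles$skew <- with(quartiles, bowley_skew(q1, med, q3))
print(quartiles)
```

This measure only looks at the middle half of the data, so it ignores the long tails and outliers noted above. It is a complement to the histograms rather than a replacement.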
There are 4,898 white wines in this dataset, with 1 output variable and 11 input variables.
The output is based on sensory data (the median of at least 3 evaluations made by wine experts).
(worst) ----> (best)
quality: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Input variables (based on physicochemical tests):
The main feature is quality. I would like to figure out which input features affect quality the most and whether it is possible to predict quality from some of the input features.
All the input features (fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, and alcohol) could possibly affect wine quality.
No, I did not create any new variables.
I noticed these two unusual distributions:
- The residual sugar data are concentrated between 1 and 2 g/dm^3.
- There is an unusual peak at 0.5 g/dm^3 in the citric acid data.
Also, the lowest quality score is 3 and the highest is 9.
## [1] "fixed.acidity" "volatile.acidity" "citric.acid"
## [4] "residual.sugar" "chlorides" "quality"
## [1] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [4] "pH" "sulphates" "alcohol"
## [7] "quality"
From the subsets of the data, the following can be said.
- Between the output (quality) and the input variables:
  - The higher the citric acid, the higher the quality tends to be.
  - The lower the residual sugar, the higher the quality tends to be.
  - The lower the chlorides, the higher the quality tends to be.
  - The lower the density, the higher the quality tends to be.
  - The higher the pH, the higher the quality tends to be.
  - For quality above 6, the higher the alcohol, the higher the quality tends to be.
We will look into these relationships using the mean values (the dark red points in the graphs) and boxplots.
## pf$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2100 0.2575 0.3450 0.3360 0.3850 0.4700
## --------------------------------------------------------
## pf$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1900 0.2900 0.3042 0.4000 0.8800
## --------------------------------------------------------
## pf$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2400 0.3200 0.3377 0.4100 1.0000
## --------------------------------------------------------
## pf$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.270 0.320 0.338 0.380 1.660
## --------------------------------------------------------
## pf$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0100 0.2800 0.3100 0.3256 0.3600 0.7400
## --------------------------------------------------------
## pf$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0400 0.2800 0.3200 0.3265 0.3600 0.7400
## --------------------------------------------------------
## pf$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.290 0.340 0.360 0.386 0.450 0.490
Between quality 7 and 9, the amount of citric acid increases with quality. At the other quality levels, however, citric acid shows no clear increasing or decreasing trend.
## pf$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.587 4.600 6.393 10.700 16.200
## --------------------------------------------------------
## pf$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.300 2.500 4.628 7.100 17.550
## --------------------------------------------------------
## pf$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.800 7.000 7.335 11.500 23.500
## --------------------------------------------------------
## pf$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.700 5.300 6.442 9.900 65.800
## --------------------------------------------------------
## pf$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.900 1.700 3.650 5.186 7.325 19.250
## --------------------------------------------------------
## pf$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.800 2.100 4.300 5.671 8.200 14.800
## --------------------------------------------------------
## pf$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.60 2.00 2.20 4.12 4.20 10.60
The amount of residual sugar seems weakly related to quality. From this plot, we could say that, across the data as a whole, quality increases as residual sugar decreases. But between quality 4 and 5, and between 7 and 8, residual sugar increases as quality increases. To confirm whether the overall tendency is real, we would need more data on wines rated 4, 8, or 9.
## pf$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.02200 0.03625 0.04100 0.05430 0.05400 0.24400
## --------------------------------------------------------
## pf$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0130 0.0380 0.0460 0.0501 0.0540 0.2900
## --------------------------------------------------------
## pf$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.04000 0.04700 0.05155 0.05300 0.34600
## --------------------------------------------------------
## pf$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01500 0.03600 0.04300 0.04522 0.04900 0.25500
## --------------------------------------------------------
## pf$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01200 0.03100 0.03700 0.03819 0.04400 0.13500
## --------------------------------------------------------
## pf$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01400 0.03000 0.03600 0.03831 0.04400 0.12100
## --------------------------------------------------------
## pf$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0180 0.0210 0.0310 0.0274 0.0320 0.0350
Quality is relatively strongly related to the amount of chlorides: as chlorides decrease, quality increases.
## pf$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9911 0.9925 0.9944 0.9949 0.9969 1.0001
## --------------------------------------------------------
## pf$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9892 0.9926 0.9941 0.9943 0.9958 1.0004
## --------------------------------------------------------
## pf$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9872 0.9933 0.9953 0.9953 0.9972 1.0024
## --------------------------------------------------------
## pf$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9876 0.9917 0.9937 0.9940 0.9959 1.0390
## --------------------------------------------------------
## pf$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9906 0.9918 0.9925 0.9937 1.0004
## --------------------------------------------------------
## pf$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9903 0.9916 0.9922 0.9935 1.0006
## --------------------------------------------------------
## pf$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9897 0.9898 0.9903 0.9915 0.9906 0.9970
As density decreases, quality increases. As I found below, density is related to the amount of alcohol, so this tendency might simply reflect the amount of alcohol.
## pf$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.870 3.035 3.215 3.188 3.325 3.550
## --------------------------------------------------------
## pf$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.830 3.070 3.160 3.183 3.280 3.720
## --------------------------------------------------------
## pf$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.790 3.080 3.160 3.169 3.240 3.790
## --------------------------------------------------------
## pf$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.080 3.180 3.189 3.280 3.810
## --------------------------------------------------------
## pf$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.840 3.100 3.200 3.214 3.320 3.820
## --------------------------------------------------------
## pf$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.940 3.120 3.230 3.219 3.330 3.590
## --------------------------------------------------------
## pf$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.200 3.280 3.280 3.308 3.370 3.410
Quality is weakly related to pH: as pH increases, quality increases.
## pf$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.55 10.45 10.35 11.00 12.60
## --------------------------------------------------------
## pf$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.40 10.10 10.15 10.75 13.50
## --------------------------------------------------------
## pf$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.000 9.200 9.500 9.809 10.300 13.600
## --------------------------------------------------------
## pf$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.50 9.60 10.50 10.58 11.40 14.00
## --------------------------------------------------------
## pf$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.60 10.60 11.40 11.37 12.30 14.20
## --------------------------------------------------------
## pf$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.50 11.00 12.00 11.64 12.60 14.00
## --------------------------------------------------------
## pf$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.40 12.40 12.50 12.18 12.70 12.90
The relationship between quality and alcohol is strong, especially when alcohol is above 10.5%.
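To make the shape of this relationship concrete, the per-score mean alcohol values printed above can be compared step by step. This is a minimal sketch that re-enters the reported means and counts by hand; `mean_alcohol`, `n_wines`, `step_change` and `overall_mean` are illustrative names.

```r
# Mean alcohol (%) per quality score, copied from the grouped summaries above.
mean_alcohol <- c("3" = 10.35, "4" = 10.15, "5" = 9.809, "6" = 10.58,
                  "7" = 11.37, "8" = 11.64, "9" = 12.18)

# Number of wines per quality score, copied from the count table earlier.
n_wines <- c("3" = 20, "4" = 163, "5" = 1457, "6" = 2198,
             "7" = 880, "8" = 175, "9" = 5)

# Change in mean alcohol from one quality score to the next.
step_change <- diff(mean_alcohol)
print(round(step_change, 3))

# Sanity check: the count-weighted average of the group means should be
# close to the overall mean alcohol reported by summary() (10.51).
overall_mean <- weighted.mean(mean_alcohol, n_wines)
print(round(overall_mean, 2))
```

The first two steps (3 to 4 and 4 to 5) are negative and all later steps are positive, so mean alcohol only rises with quality from score 5 upward. The weighted average lands within rounding error of the overall mean, which suggests the copied values are consistent.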
I would also like to look at the relationship between sulphates and quality: because sulphates are an additive, they could presumably lower the quality.
## pf$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2800 0.3800 0.4400 0.4745 0.5425 0.7400
## --------------------------------------------------------
## pf$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2500 0.3800 0.4700 0.4761 0.5400 0.8700
## --------------------------------------------------------
## pf$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2700 0.4200 0.4700 0.4822 0.5300 0.8800
## --------------------------------------------------------
## pf$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2300 0.4100 0.4800 0.4911 0.5500 1.0600
## --------------------------------------------------------
## pf$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2200 0.4100 0.4800 0.5031 0.5800 1.0800
## --------------------------------------------------------
## pf$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2500 0.3800 0.4600 0.4862 0.5850 0.9500
## --------------------------------------------------------
## pf$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.360 0.420 0.460 0.466 0.480 0.610
From this plot, sulphates show almost no trend with quality.
Also, from the ggpairs graphs, regarding the input variables:
- Moderate to strong correlations are observed between free.sulfur.dioxide and total.sulfur.dioxide, between density and total.sulfur.dioxide, and between alcohol and density.
## [1] 0.615501
As free sulfur dioxide increases, total sulfur dioxide increases (correlation coefficient = 0.6155).
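A coefficient like the one above is what base R's `cor()` returns for two numeric columns. The sketch below assumes Pearson correlation was used and shows the call on the first ten rows only, copied from the `str()` output. The data frame name `so2` and the other variable names are illustrative.

```r
# First ten rows of the two sulfur dioxide columns, copied from the str()
# output above. Ten rows are only enough to illustrate the call; they will
# not reproduce the full-data coefficient.
so2 <- data.frame(
  free.sulfur.dioxide  = c(45, 14, 30, 47, 47, 30, 30, 45, 14, 28),
  total.sulfur.dioxide = c(170, 132, 97, 186, 186, 97, 136, 170, 132, 129)
)

# Pearson correlation coefficient r on the ten-row sample.
r_sample <- cor(so2$free.sulfur.dioxide, so2$total.sulfur.dioxide,
                method = "pearson")
print(r_sample)

# The value reported for the full data set is r, not R^2.
r_full <- 0.615501
r_squared_full <- r_full^2  # share of variance explained by a linear fit
print(round(r_squared_full, 4))
```

Squaring the reported coefficient gives an R^2 of about 0.38, so a straight-line fit between the two sulfur dioxide measures would explain a little over a third of the variance.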
## [1] 0.5298813
As total sulfur dioxide increases, density increases (correlation coefficient = 0.5299). This might suggest that sulfur dioxide compounds are denser than the other compounds in white wine, although a correlation alone cannot establish that.
## [1] -0.7801376
As alcohol increases, density decreases (correlation coefficient = -0.7801).
The relationship between quality and alcohol is strong, especially when alcohol is above 10.5%.
Also, as density decreases, quality increases. Density is related to the amount of alcohol, so this tendency might simply reflect the amount of alcohol.
The amount of residual sugar seems weakly related to quality. Even though the order of the residual sugar values reverses between quality 4 and 5 and between 7 and 8, in general quality increases as residual sugar decreases.
Quality is relatively strongly related to the amount of chlorides: as chlorides decrease, quality increases.
Quality is related to pH: as pH increases, quality increases.
There was almost no influence of citric acid on quality: between quality 3 and 8 the values show no order or trend, even though the amount at quality 9 was higher than at the other levels.
I found a positive correlation between total sulfur dioxide and free sulfur dioxide, with a correlation coefficient (r) of 0.616. According to the information given about this dataset, total sulfur dioxide includes free sulfur dioxide, so this makes sense.
As the amount of total sulfur dioxide increases, density increases.
As the amount of alcohol increases, density decreases. Presumably this is because the density of alcohol (ethanol) is lower than the density of water.
This makes sense because density is a dependent variable that changes with the wine's components, such as water, alcohol, and sulfur dioxide compounds.
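The direction of the alcohol and density relationship can be illustrated with an idealized two-component mixture. This is a rough sketch under stated assumptions, not a model of wine: it assumes the alcohol column is percent by volume, that volumes add linearly, and it uses rounded textbook densities for water and ethanol at 20 degrees C. The function name `ideal_density` is hypothetical.

```r
# Approximate densities at 20 degrees C in g/cm^3 (rounded textbook values).
rho_water   <- 0.998
rho_ethanol <- 0.789

# Idealized density of a water-ethanol mixture for a given alcohol content
# in percent by volume. Assumes volumes add linearly and ignores sugar,
# acids and every other dissolved compound, so absolute values will not
# match the wine densities in the data set.
ideal_density <- function(alcohol_pct) {
  phi <- alcohol_pct / 100
  phi * rho_ethanol + (1 - phi) * rho_water
}

alcohol_pct <- c(8, 10, 12, 14)  # spans the observed range 8.0 to 14.2
print(round(ideal_density(alcohol_pct), 4))
```

Under these assumptions density falls steadily as the alcohol share rises, which is the same direction as the negative correlation observed in the data.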
Also, wines rated 3, 4, 8, or 9 are far fewer than those rated 5, 6, or 7, so these small samples could produce misleading trends. In particular, between quality 3 and 5 the trend was opposite to the trend at the other quality levels for some input variables (residual sugar, density, alcohol).
The quality of white wine is most strongly correlated with the amount of alcohol.
The amounts of residual sugar and chlorides, and the pH, are also correlated with quality.
It is clear that quality gets higher as alcohol increases, but the pattern is less clear for residual sugar.
It is still clear that as the amount of chlorides decreases, the quality gets higher.
From this plot it is hard to see the correlation between pH and quality, even though the correlation is clear in the box plot.
We can clearly see that as density decreases, the quality gets higher.
There is an outlier around 66 in residual sugar (the maximum is 65.8), so we adjust the x-axis to zoom in on most of the data. (The same adjustment is applied to the residual sugar vs. pH graph.)
Even though the residual sugar values are spread across all quality levels, higher-quality wines tend to have lower residual sugar.
There are some outliers in chlorides, so we adjust the x-axis to zoom in on most of the data.
We can see that when the pH is high and the amount of chlorides is low, the quality is high. Also, when the amount of chlorides is more than 0.10, all the pH values are below 3.3.
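One way to choose axis limits like the ones used above is to cut the axis at a chosen quantile instead of picking a value by eye. The sketch below is an assumption about how this could be done, not the approach taken in this report. `zoom_limits` and its `keep` parameter are hypothetical names, and the input is only the first ten residual sugar values from the `str()` output.

```r
# Axis limits that keep a chosen share of the data in view.
# `keep` is a hypothetical parameter name; 0.99 keeps the lowest 99%.
zoom_limits <- function(x, keep = 0.99) {
  c(min(x, na.rm = TRUE),
    unname(quantile(x, probs = keep, na.rm = TRUE)))
}

# First ten residual sugar values, copied from the str() output earlier.
residual_sugar <- c(20.7, 1.6, 6.9, 8.5, 8.5, 6.9, 7, 20.7, 1.6, 1.5)

# 0.75 is used only because this ten-value sample is tiny; on the full
# column a value such as 0.99 would be a more natural choice.
limits <- zoom_limits(residual_sugar, keep = 0.75)
print(limits)

# With ggplot2, these limits could be passed to
# coord_cartesian(xlim = limits), which zooms without dropping points.
```

Zooming with `coord_cartesian()` rather than filtering the data keeps any fitted summaries in the plot based on all observations, including the outliers.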
From the plot of chlorides vs. alcohol, it is clearly observed that as the amount of chlorides decreases, the quality increases. It is also clearly visible that wines with higher alcohol have better quality.
From the graph of residual sugar vs. pH, we can see, although the effect is small, that wines with lower residual sugar have higher quality.
After seeing the graph of pH, I found that pH does not have a strong correlation with quality.
In the boxplot of pH vs. quality, we could read that higher pH came with higher quality, but in the scatter plot it is difficult to see such a clear tendency.
No, I did not create any models.
The quality scale runs from 0 to 10, but the actual data range from 3 to 9. Grade 3 has 20 samples and grade 9 has only 5. The largest group is grade 6 (2,198 samples). The small sample sizes could limit the reliability of the observations, because they could be biased.
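The effect of these unequal group sizes on the precision of a group mean can be sketched from the counts alone. The example assumes, purely for illustration, that every quality score has the same within-group standard deviation, which the data may not satisfy. `n_wines` and `relative_se` are illustrative names.

```r
# Number of wines per quality score, copied from the count table earlier.
n_wines <- c("3" = 20, "4" = 163, "5" = 1457, "6" = 2198,
             "7" = 880, "8" = 175, "9" = 5)

# The standard error of a group mean is sd / sqrt(n). Assuming, purely for
# illustration, the same within-group sd for every score, the standard error
# relative to the largest group (score 6) depends only on the counts.
relative_se <- sqrt(max(n_wines) / n_wines)
print(round(relative_se, 1))
```

Under that assumption, the mean for score 9 is roughly 21 times less precise than the mean for score 6, and the mean for score 3 about 10 times less precise.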
From this graph, it is clear that density and alcohol have a linear relationship (corr = -0.78). The boundary between quality levels is clearer along the alcohol axis than along the density axis. For example, between 8% and 10% alcohol the majority quality is 5, between 10% and 12% it is 6, between 12% and 13% it is 7, and between 13% and 14% it is 8. On the density axis, by contrast, it is difficult to find a clear band that separates the quality levels.
As the chlorides level gets lower, the quality gets higher. Between 0.01 and 0.1 in chlorides, quality ranges from 4 to 9, whereas between 0.1 and 0.3 most quality scores are in the range from 3 to 6.
This data set contains information about 4,898 white wines across 11 input attributes and 1 output attribute (quality).
We investigated the correlations between quality and the input variables, and among the input variables.
There was a relatively strong relationship between quality and alcohol, density, and chlorides.
In other words, better quality white wines tend to have a higher alcohol percentage, lower density, and lower chlorides.
Also, alcohol and density have a linear relationship, presumably because the density of ethanol is lower than that of water.
We have to be aware that this analysis has some limitations:
First, we have limited samples. In particular, the number of samples at quality 3 and 9 is very small. To get more accurate insights from this dataset, we would need to collect more data for these quality levels.
Second, there is no information on other factors that could influence quality, such as grape type, wine brand, or selling price, due to privacy and logistics issues.